A survey is given of some recent calculations of univariate and multivariate probability density functions 
(pdfs) of structure factors used to interpret crystallographic data. We have found that in the presence of 
[...] form of Edgeworth or Gram-Charlier series can be quite unreliable, and in these cases the more exact, but [...]

Few scientific disciplines depend so heavily on techniques 
based on the central limit theorem and associated 
expansions in orthogonal polynomials as does crys- 
tallography. Ever since the pioneering work of Wilson 
[1,2] and Karle and Hauptman [3-5], the central limit 
theorem has played a vital role in translating crys- 
tallographic scattering data into structural information 
and, indeed, it is built into many computer routines for 
this purpose. As we will show, when the central limit 
theorem is applied to data from unit cells with a consid- 
erable variation in the atomic weights of the constituent 
atoms it can lead to serious qualitative errors. That this 
is true is well known to crystallographers, who have 
made heavy use of Edgeworth and related expansions to 
correct zeroth-order approximations based on the central 
limit theorem [5,6]. It is not generally appreciated, 
however, that serious errors can persist even with these 
correction terms, provided that atomic heterogeneity is 
sufficiently great. This suggests the value that may be 
attached to exact results when these are available and 
are readily computed. This paper reports on recent efforts 
that we and several collaborators [7-11] have made in 
this direction. 

Two general classes of probabilistic methods are used 
to deduce structural information from radiation in- 
tensity diffracted from crystals, the so-called intensity 
statistics and direct methods of phase determination. In 
order to make this exposition self-contained, we will 
sketch how such information can be derived from data 
on intensities in a particularly simple case, and refer the 
interested reader to two monographs that give detailed 
accounts of these subjects in more general cases [12,13]. 
The arrangement of atoms in a unit cell of a crystal is 
most often restricted by the space group to which the 
crystal belongs [14], and in the general case, only the 
arrangement within the asymmetric part of the cell 
needs to be determined. The intensity of the diffracted 
radiation can be represented in terms of structure factors 
F(h), where the vector h and its components (h,k,l), the 
orders of diffraction, specify the geometric relation be- 
tween incident and scattered beams and their relative 
orientation to the basis vectors of the lattice of the dif- 
fracting crystal [14]. The structure factors are Fourier 
coefficients of the (periodic) density function of the scat- 
tering matter, and both their magnitude and phase are 
required in order to reconstruct the density — i.e., the 
actual atomic arrangement. Thus, F(h) is in general a 
complex quantity, which we write as 
F(h) = A(h) + iB(h). The function F(h) can be expressed 
as a sum of contributions from individual atoms in the 
unit cell as 

$$F(\mathbf{h}) = \sum_{j=1}^{n} f_j \exp(2\pi i\,\mathbf{h}\cdot\mathbf{r}_j) = \sum_{j=1}^{n} f_j \exp(i\theta_j) \tag{1}$$



where $\mathbf{r}_j$ is the location of atom j, the $f_j$ are so-called 
scattering or form factors which can be approximated, 
in the normalized-structure-factor representation (see 
below), by the atomic numbers of the corresponding 
atoms, and $\theta_j = 2\pi\,\mathbf{h}\cdot\mathbf{r}_j$. The space of h is surveyed by 
varying the orientation of the crystal with respect to the 
incident beam. 

Since F(h) is a complex quantity, it can be represented 
as a vector in a plane which is the sum of n vectors, the 
j'th being $f_j \exp(i\theta_j)$. The fundamental difficulty faced 
by crystallographers is that only the magnitude |F| is 
measurable (although some recent work may change 
this situation [15]), and the phase of F(h) must be in- 
ferred indirectly. To do so, one can establish a corre- 
spondence between the vector F and a random walk 
first studied by Pearson [16]. Using a theorem of Weyl 
[17], one can show that if the components of r are rationally 
independent, i.e., there exists no vector of integers 
m such that m·r = integer, then the set of angles, $\{\theta_j\}$, 
can be regarded as consisting of independent random 
variables, each of which is uniformly distributed over 
the interval (0, 2π) [17]. Thus the properties of the F(h) can 
be determined using probabilistic methods, as was first 
pointed out by Wilson [1,2]. 
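
The random-walk picture can be illustrated with a short Monte Carlo sketch. It assumes independent phases that are uniform on (0, 2π), as in the Weyl argument above, a hypothetical noncentrosymmetric cell of 14 carbon atoms and one uranium atom, and atomic numbers standing in for the scattering factors $f_j$. The function name, sample size and seed are illustrative choices and are not taken from the original work.

```python
import cmath
import math
import random


def sample_structure_factors(f, n_samples=5000, seed=0):
    """Draw F = sum_j f_j exp(i*theta_j) with independent uniform phases.

    The samples are divided by sqrt(sum_j f_j**2), which equals
    <|F|^2>^(1/2) when the phases are independent and uniform.
    """
    rng = random.Random(seed)
    norm = math.sqrt(sum(fj * fj for fj in f))
    samples = []
    for _ in range(n_samples):
        total = 0j
        for fj in f:
            theta = rng.uniform(0.0, 2.0 * math.pi)
            total += fj * cmath.exp(1j * theta)
        samples.append(total / norm)
    return samples


# Hypothetical cell: 14 carbon atoms (Z = 6) and one uranium atom (Z = 92).
f = [6.0] * 14 + [92.0]
E = sample_structure_factors(f)
mean_E = sum(E) / len(E)
mean_sq = sum(abs(e) ** 2 for e in E) / len(E)
print("mean of E      :", mean_E)
print("mean of |E|**2 :", mean_sq)
```

Under these assumptions the sample mean of the scaled structure factor should be close to zero and the mean of its squared magnitude close to one, whatever the spread of the $f_j$; the shape of the distribution, by contrast, depends strongly on that spread.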

For a typical and important case in which probabilistic 
techniques allow one to derive structural information, 
consider how one can distinguish between 
centrosymmetric and noncentrosymmetric (space groups 
$P\bar{1}$ and $P1$, respectively) unit cells on the basis of 
intensity statistics alone. A centrosymmetric unit cell is 
one in which, for every atom located at $\mathbf{r}_j$, there is an 
identical one at $-\mathbf{r}_j$. Consequently, if we write 
F = A + iB where 

$$A = \sum_{j=1}^{n} f_j\cos\theta_j, \qquad B = \sum_{j=1}^{n} f_j\sin\theta_j \tag{2}$$

it follows that B = 0 by symmetry in the presence of 
centrosymmetry. When the unit cell is 
noncentrosymmetric, B is not necessarily equal to 0. 
Hence the value of F can be represented as a one-dimensional 
random walk in $P\bar{1}$ and by a two-dimensional 
random walk in $P1$. In what follows we will 
use the physics notation that "< >" denotes the average 
of the variable contained in brackets. It will also prove 
convenient to work with the normalized structure factor 
$E = F/\langle |F|^2\rangle^{1/2}$ which, since <F> = 0, has the 
property that $\langle |E|^2\rangle = 1$. Wilson's argument uses the 
central limit theorem to deduce the pdf of scattered 
intensities. In $P\bar{1}$, for which B = 0, the form of the pdf of 
|E| that follows from the central limit theorem is 



$$p(|E|) = (2/\pi)^{1/2}\exp(-|E|^2/2). \tag{3}$$



The corresponding pdf for the two dimensional case 
for unit cells without a crystallographic center of sym- 
metry is 



$$p(|E|) = 2|E|\exp(-|E|^2). \tag{4}$$



The qualitative difference between eqs (3) and (4) thus 
allows the experimental distinction to be drawn purely 
on a comparison of intensity data with the two forms for 
the pdf. 
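
The two central-limit forms can be compared numerically. The sketch below evaluates eqs (3) and (4) and integrates them with a simple trapezoid rule; the helper names, the integration cutoff and the step count are illustrative assumptions. Both pdfs are normalized and have unit second moment, but their means differ: analytically $(2/\pi)^{1/2} \approx 0.798$ for eq (3) and $\pi^{1/2}/2 \approx 0.886$ for eq (4), which is one simple statistic on which the distinction can be based.

```python
import math


def p_centric(e):
    """Eq (3): central-limit pdf of |E| for a centrosymmetric cell."""
    return math.sqrt(2.0 / math.pi) * math.exp(-e * e / 2.0)


def p_acentric(e):
    """Eq (4): central-limit pdf of |E| for a noncentrosymmetric cell."""
    return 2.0 * e * math.exp(-e * e)


def integrate(func, a, b, n=20000):
    """Composite trapezoid rule on [a, b] with n intervals."""
    h = (b - a) / n
    total = 0.5 * (func(a) + func(b))
    for k in range(1, n):
        total += func(a + k * h)
    return total * h


def moment(pdf, k, upper=12.0):
    """k-th moment of |E|; the tail beyond `upper` is negligible."""
    return integrate(lambda e: e ** k * pdf(e), 0.0, upper)


for name, pdf in (("centric", p_centric), ("acentric", p_acentric)):
    print(name, [moment(pdf, k) for k in (0, 1, 2)])
```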

Notice, however, that the use of the central limit the- 
orem presupposes the validity of certain assumptions, 
the major one of which is the presence of a large number 
of atoms in the unit cell and the second of which is that 
the $f_j$ appearing in eq (1) should not exhibit too great a 
heterogeneity. The first of these assumptions holds for 
most crystalline materials of interest, but the second 
may be violated particularly when there are a small 
number of atoms that are considerably heavier than the 
majority of atoms comprising the molecule. When this is 
the case it is customary to replace, e.g., eq (3) by the 
Edgeworth series 

$$p(|E|) = (2/\pi)^{1/2}\exp(-|E|^2/2)\Big\{1 + \sum_{n} a_n H_n(|E|/\sqrt{2})\Big\} \tag{5}$$

where the n'th coefficient, $a_n$, is expressible as a linear 
combination of the moments of A in eq (2) and $H_n(x)$ is 
the n'th Hermite polynomial. These are readily calculated 
for the simpler space groups [18], and all space-group 
results are available for fourth, sixth, and eighth 
moments [19,20]. Furthermore, the Edgeworth expansion 
may also not be too useful in the presence of extreme 
heterogeneity. This is illustrated in figure 1 in 
which the asymmetric unit of a cell in $P\bar{1}$ consists of 14 
carbon atoms and one uranium atom, with a ratio of f's 
approximately equal to 15:1. With 0 or 2 moments the 
Edgeworth series fails to reproduce the maximum and 
the 4 and 8 moment approximations locate the maximum 
quite far from its actual position. It is therefore desirable 
to have an exact easily computable representation for 
the pdf which is robust with respect to changes in 
atomic heterogeneity. 

Just such a representation was first suggested by 
Barakat in a study of the freely jointed chain as a model 
for polymer configurations [21] and of laser speckle [22]. 
Let us write $g_j = f_j/(\sum_k f_k^2)^{1/2}$ so that 

$$E = \sum_{j=1}^{n} g_j \exp(i\theta_j) = A + iB \tag{6}$$

and let us set 

$$S = \sum_{j=1}^{n} g_j \tag{7}$$



so that $-S \le A, B \le S$. As an example we consider the 
case of a centrosymmetric unit cell for which B = 0. The 
pdf of A, g(A), has the property that it can differ from 
zero only in the interval $S^2 \ge A^2$. Within this interval we 
will expand g(A) in a Fourier series: 

$$g(A) = \frac{1}{2S}\sum_{n=-\infty}^{\infty} a_n \cos\!\left(\frac{n\pi A}{S}\right) \tag{8}$$


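where 

$$a_n = \int_{-S}^{S} g(A)\cos\!\left(\frac{n\pi A}{S}\right) dA = \int_{-\infty}^{\infty} g(A)\cos\!\left(\frac{n\pi A}{S}\right) dA = C\!\left(\frac{n\pi}{S}\right) \tag{9}$$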



where $C(\omega)$ is the characteristic function generated by 
g(A). The Fourier series in eq (8) corresponds to a sampling 
theorem [23] for pdfs with a compact support. 
When the unit cell is noncentrosymmetric, so that $B \ne 0$ 
in general, it is more convenient to expand the pdf of 
$|E| = (A^2 + B^2)^{1/2}$ in a Fourier-Bessel function series 

$$p(|E|) = \frac{2|E|}{S^2}\sum_{j=1}^{\infty} D_j\, J_0\!\left(\frac{\gamma_j |E|}{S}\right) \tag{10}$$

where the $\gamma_j$ are successive roots of $J_0(\gamma) = 0$ and the 
coefficients, $D_j$, are 

$$D_j = C(\gamma_j/S)/J_1^2(\gamma_j) \tag{11}$$

again written in terms of the characteristic function. 
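
The Fourier representation of eqs (8) and (9) can be sketched in a few lines for the centrosymmetric case. The sketch assumes the product-of-Bessel-functions form of the characteristic function that the text gives for $P\bar{1}$ in eqs (13) and (14), uses atomic numbers in place of scattering factors, and takes the 14-carbon, one-uranium asymmetric unit of figure 1 as its example. The function names, the number of terms and the numerical quadrature used for $J_0$ are illustrative choices, not the authors' implementation.

```python
import math


def bessel_j0(x, steps=200):
    """J0(x) = (1/pi) * integral_0^pi cos(x*sin(t)) dt, midpoint rule."""
    h = math.pi / steps
    return sum(math.cos(x * math.sin((k + 0.5) * h)) for k in range(steps)) / steps


def fourier_coefficients(z_asym, n_terms=60):
    """Return S and the coefficients a_n = C(n*pi/S) for n = 0..n_terms.

    z_asym lists the atomic numbers in the asymmetric unit; the
    centrosymmetric cell contains each of them twice.
    """
    norm = math.sqrt(2.0 * sum(z * z for z in z_asym))
    g = [z / norm for z in z_asym]
    S = 2.0 * sum(g)  # largest attainable |A|
    coeffs = []
    for n in range(n_terms + 1):
        w = n * math.pi / S
        c = 1.0
        for gj in g:
            c *= bessel_j0(2.0 * gj * w)  # assumed single-atom factor, eq (14)
        coeffs.append(c)
    return S, coeffs


def pdf_A(a, S, coeffs):
    """Truncated Fourier series for g(A) on [-S, S]."""
    total = coeffs[0]
    for n in range(1, len(coeffs)):
        total += 2.0 * coeffs[n] * math.cos(n * math.pi * a / S)
    return total / (2.0 * S)


def trapezoid(values, step):
    """Trapezoid rule for equally spaced samples."""
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))


# Example: 14 carbon atoms and one uranium atom in the asymmetric unit.
S, coeffs = fourier_coefficients([6.0] * 14 + [92.0])
grid = [-S + 2.0 * S * k / 400 for k in range(401)]
values = [pdf_A(a, S, coeffs) for a in grid]
step = grid[1] - grid[0]
area = trapezoid(values, step)
second_moment = trapezoid([a * a * v for a, v in zip(grid, values)], step)
print("S =", S, " area =", area, " <A^2> =", second_moment)
```

The coefficients are computed once and reused for every value of A, so the cost of tabulating the pdf is dominated by the evaluation of the characteristic function at the sampling points $n\pi/S$.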
Figure 1-(a) Approximations to the exact pdf g(A) (denoted by the solid line) for a unit cell in space group $P\bar{1}$ consisting of 14 carbon 
atoms and a single uranium atom (atomic weight ratio 15.4:1) in the asymmetric unit. For convenience $A_{max}$ has been set equal to 1. Note 
that the pdf is symmetric around A = 0. The approximations are a Gaussian (---) and the Gaussian corrected by two moments (--). (b) 
Approximations to the same pdf as in figure 1a by an Edgeworth series using 4 moments (---) and 8 moments (---.). 

Two questions that require an answer relate to the 
advantage of representations such as those in eqs (8) or 
(10) and the feasibility of numerical evaluation of the 
series. In the absence of atomic heterogeneity, or when 
there is a very large number of atoms in a unit cell, the 
central limit results are perfectly adequate for crystallographic 
applications. However, when there are 
fewer than about 40 atoms in the unit cell combined with 
one or two outstandingly heavy atoms, one has to con- 
volve the Gaussian with the appropriate pdf for the 
heavy atoms [12]. In principle, using the series of eqs (8) 
and (10) finesses this difficulty, provided that the con- 
vergence properties are not overwhelming. In practice, 
in the case of intensity statistics we have found no problem 
in evaluating the Fourier or Fourier-Bessel function 
series, requiring no more than about 40 terms for the 
most extreme amounts of heterogeneity, and many 
fewer terms in the absence of heterogeneity. The evalu- 
ation of the analogous series for direct methods can 
present much tougher numerical problems, as we will 
see. Finally, a problem not so far discussed is the ease 
with which expressions for the characteristic function 
can be calculated. We have found that it is not too 
difficult to evaluate the characteristic function for all 
but a handful of space groups, whose structure factor is 
found in the International Tables [24]. As an example, let 
us write the structure factor for $P\bar{1}$ as 



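$$A = 2\sum_{j=1}^{n/2} g_j\cos\theta_j \tag{12}$$

where the $g_j$ are known and the $\theta_j$ are uniformly distributed 
in (0, 2π). The characteristic function is 

$$C(\omega) = \Big\langle \exp\Big(2i\omega\sum_{j=1}^{n/2} g_j\cos\theta_j\Big)\Big\rangle = \prod_{j=1}^{n/2} C_j(\omega) \tag{13}$$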



where 

$$C_j(\omega) = \langle\exp(2i\omega g_j\cos\theta)\rangle = \frac{1}{2\pi}\int_0^{2\pi}\exp(2i\omega g_j\cos\theta)\,d\theta = J_0(2\omega g_j). \tag{14}$$

Other examples merely test one's ability to evaluate integrals. 
For example, we have recently examined the Fourier 
representation of the pdf of the intensity for a unit 
cell in $P\bar{1}$ in which there is an auxiliary or 
noncrystallographic center of symmetry located at d, so that 
a single atom located at $\mathbf{r}_j$ generates atoms at $-\mathbf{r}_j$ and 
$\pm(\mathbf{r}_j - 2\mathbf{d})$ [9,10]. In this case one can show that 

$$A(\mathbf{h}) = 4\sum_{j=1}^{n/4} g_j\cos(2\pi\mathbf{h}\cdot\mathbf{d})\cos[2\pi\mathbf{h}\cdot(\mathbf{r}_j - \mathbf{d})] \tag{15}$$

and the corresponding characteristic function is 

$$C(\omega) = \frac{2}{\pi}\int_0^{\pi/2}\Big(\prod_{j=1}^{n/4} J_0(4\omega g_j\cos\alpha)\Big)\,d\alpha. \tag{16}$$

Again the numerical problems associated with this representation 
were not severe and allowed us to generalize 
the theory first presented by Rogers and Wilson for the 
equal-atom case [9,25]. It is possible, though algebraically 
messy, to generate the orthogonal polynomials 
corresponding to the Rogers-Wilson pdf, but the Fourier 
series representation is relatively straightforward. 
One can also analyze partially bicentric structures using 
the same techniques [10]. 

Our present development of Fourier representations 
of crystallographic pdfs has led us into the examination 
of direct methods in which one is interested in the joint 
pdf of several, usually correlated, structure factors. One 
of the simplest examples of these is the so-called $\Sigma_1$ 
relationship [13,26], in which one uses the joint pdf of 
E(h) and E(2h) to determine the probability that the 
phase of E(2h) is positive given a knowledge of |E(h)| 
and |E(2h)|. For simplicity we consider structures in 
$P\bar{1}$, letting E(h) = E and E(2h) = G. Then 

$$p(E,G) = \frac{1}{4S^2}\sum_{r=-\infty}^{\infty}\sum_{s=-\infty}^{\infty} C\!\left(\frac{\pi r}{S},\frac{\pi s}{S}\right)\exp\!\left[-\frac{i\pi}{S}(rE + sG)\right] \tag{17}$$

where 

$$C(\omega_1,\omega_2) = \langle\exp(i\omega_1 E + i\omega_2 G)\rangle = \prod_{j=1}^{n/2} C_j(\omega_1,\omega_2) \tag{18}$$

in which, since 

$$G = 2\sum_{j=1}^{n/2} g_j\cos(2\theta_j),$$



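$$C_j(\omega_1,\omega_2) = \frac{1}{2\pi}\int_0^{2\pi}\exp[2ig_j(\omega_1\cos\theta + \omega_2\cos 2\theta)]\,d\theta = R_j + iI_j \tag{19}$$

where $R_j$ and $I_j$ can be expanded in terms of Bessel 
functions as 

$$R_j(\omega_1,\omega_2) = J_0(2g_j\omega_1)J_0(2g_j\omega_2) + 2\sum_{m=1}^{\infty}(-1)^m J_{4m}(2g_j\omega_1)J_{2m}(2g_j\omega_2) \tag{20}$$

$$I_j(\omega_1,\omega_2) = 2\sum_{m=0}^{\infty}(-1)^{m+1} J_{4m+2}(2g_j\omega_1)J_{2m+1}(2g_j\omega_2).$$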
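
The Bessel-function expansions of eq (20) can be checked numerically against the defining phase average of eq (19). The sketch below is such a check under stated assumptions: the integer-order Bessel functions are evaluated from Bessel's integral with a midpoint rule, the series are truncated at a hypothetical `m_max`, and the values of $g_j$, $\omega_1$ and $\omega_2$ are arbitrary.

```python
import math


def bessel_jn(n, x, steps=400):
    """J_n(x) = (1/pi) * integral_0^pi cos(n*t - x*sin(t)) dt, midpoint rule."""
    h = math.pi / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        total += math.cos(n * t - x * math.sin(t))
    return total / steps


def cj_direct(g, w1, w2, steps=400):
    """Phase average of eq (19) on a uniform grid over one period."""
    re = im = 0.0
    for k in range(steps):
        theta = 2.0 * math.pi * k / steps
        phase = 2.0 * g * (w1 * math.cos(theta) + w2 * math.cos(2.0 * theta))
        re += math.cos(phase)
        im += math.sin(phase)
    return complex(re / steps, im / steps)


def cj_series(g, w1, w2, m_max=12):
    """R_j + i*I_j from the truncated Bessel series of eq (20)."""
    a, b = 2.0 * g * w1, 2.0 * g * w2
    r = bessel_jn(0, a) * bessel_jn(0, b)
    for m in range(1, m_max + 1):
        r += 2.0 * (-1) ** m * bessel_jn(4 * m, a) * bessel_jn(2 * m, b)
    i = 0.0
    for m in range(m_max + 1):
        i += 2.0 * (-1) ** (m + 1) * bessel_jn(4 * m + 2, a) * bessel_jn(2 * m + 1, b)
    return complex(r, i)


g, w1, w2 = 0.3, 2.0, 3.0  # arbitrary illustrative values
direct = cj_direct(g, w1, w2)
series = cj_series(g, w1, w2)
print(direct, series, abs(direct - series))
```

For moderate arguments only the first few terms of each series contribute appreciably, which is what makes the single-atom factors cheap to evaluate; the cost of the joint pdf comes from the double Fourier sum in which they are used.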

From eq (18) it follows that $C(\omega_1,\omega_2) = H(\omega_1,\omega_2) + iI(\omega_1,\omega_2)$, 
where H and I can be computed from the $R_j$ 
and $I_j$. The probability that G is positive given E can 
now be written exactly as 

p+(2h|h) = (1/2)(1 + ?) 



(21) 



where 






+ 

(22) 



The exact eq (21) should be compared to the much 
simpler approximation furnished by the use of the cen- 
tral limit theorem [26], 

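$$p_+(2\mathbf{h}\,|\,\mathbf{h}) \approx \tfrac{1}{2}\left[1 + \tanh\!\left(\tfrac{1}{2}\sigma_3 |G| (E^2 - 1)\right)\right] \tag{23}$$

where 

$$\sigma_3 = 2\sum_{j=1}^{n/2} g_j^3 \tag{24}$$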
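
The central-limit estimate is simple enough to evaluate in a few lines. The sketch below assumes that eq (23) takes the standard Cochran-Woolfson form, written here in terms of unnormalized sums over the whole unit cell, with atomic numbers standing in for the scattering factors; the function name is hypothetical. It implements only the approximation, not the exact expression of eq (21).

```python
import math


def sigma1_clt_probability(z_cell, e_h, e_2h_abs):
    """Central-limit estimate of the probability that E(2h) is positive.

    Assumed form:
        1/2 * [1 + tanh(s3 / (2 * s2**1.5) * |E(2h)| * (E(h)**2 - 1))]
    with s_k the sum of Z**k over all atoms in the unit cell.
    """
    s2 = sum(z ** 2 for z in z_cell)
    s3 = sum(z ** 3 for z in z_cell)
    argument = s3 / (2.0 * s2 ** 1.5) * e_2h_abs * (e_h ** 2 - 1.0)
    return 0.5 * (1.0 + math.tanh(argument))


# Half unit C30Kr2 (Z = 6 and 36); the centrosymmetric cell holds two half units.
half_unit = [6.0] * 30 + [36.0] * 2
cell = half_unit * 2
for e_h in (0.5, 1.0, 1.5, 2.0, 2.5):
    print(e_h, sigma1_clt_probability(cell, e_h, 1.75))
```

In this form the estimate equals one half at |E(h)| = 1 and moves toward one as |E(h)| grows, at a rate set by the ratio of the third-order to the second-order sums, which is where atomic heterogeneity enters the approximation.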
Figure 2-A graph of the exact expression for p+(2h|h), the probability 
that the phase of E(2h) is positive, as a function of E (+ + +) 
compared to the approximation provided by eq (23) (the solid 
curve) for a molecule with the assumed composition C30Kr2 in the 
asymmetric unit. The magnitude, |E(2h)|, was chosen equal to 1.75 
for this example. 



Although eq (23) and generalizations of it are much used 
in the crystallographic literature, there has been no real 
test of its accuracy in the presence of heterogeneity, 
since until now there has been no attempt to calculate 
the exact pdf. A comparison of the result of evaluating 
eq (21) with that obtained from eq (23) for an assumed 
composition C30Kr2 in the half unit of a $P\bar{1}$ structure is 
shown in figure 2 for |G| = 1.75 [11]. A substantial difference 
between the two predictions is immediately evident. 
Further evidence of the inaccuracy of eq (23) in 
the presence of atomic heterogeneity is provided in fig- 
ure 3 where we examine the effects of the variation in 
atomic weights for a unit cell in which the half unit is 
C30X2, where X varies. In the absence of heterogeneity 
eq (23) provides perfectly satisfactory results, but its 
utility decreases considerably with an increase in the 
atomic weight of the X atom. 

We are presently examining the analogous properties 
of the $\Sigma_2$ relationship, in which one determines probabilistic 
relations between phases from properties of the 
joint pdf of E(h), E(k) and E(— h-k), which requires 
the evaluation of higher order Fourier series by the 
same basic techniques. While this investigation is very 
similar both in spirit and results to those for $\Sigma_1$ discussed 
in the last paragraphs, it appears to be much more diffi- 
cult to evaluate the series for the pdf of the three-phase 



invariant, $\Phi$, defined in terms of the phases of the triplet 
of structure factors E(h), E(k), E(-h-k), by 

$$\Phi = \phi(\mathbf{h}) + \phi(\mathbf{k}) + \phi(-\mathbf{h}-\mathbf{k}). \tag{25}$$



To convey some notion of the difficulties we point out 
that the characteristic function to be evaluated is 

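$$C_j(\boldsymbol{\omega}) = \langle\exp\{ig_j(\omega_1 A_1 + \omega_2 B_1 + \omega_3 A_2 + \omega_4 B_2 + \omega_5 A_3 + \omega_6 B_3)\}\rangle \tag{26}$$

where $E(\mathbf{h}) = A_1 + iB_1$, $E(\mathbf{k}) = A_2 + iB_2$, $E(-\mathbf{h}-\mathbf{k}) = A_3 + iB_3$. 
A detailed evaluation of $C_j(\boldsymbol{\omega})$ results in the 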
expression 

*,(w)= n J o (fj<>0+2 2(- i) m n J ltK (f^) 



m =0 b=l 



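$$C(\boldsymbol{\omega}) = R(\boldsymbol{\omega}) + iI(\boldsymbol{\omega}) = \prod_j C_j(\boldsymbol{\omega}) = \prod_j (R_j + iI_j). \tag{27}$$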



The resulting expression for the pdf of $\Phi$ is in terms of 
sevenfold Fourier series, each coefficient of which is an 
infinite series of the form shown in the last equation. 
Whether the resulting calculations can be made in a 
reasonable amount of time remains to be seen, but the 
difficulties to be overcome are exemplified by this problem. 

Figure 3-This figure shows the effects of heterogeneity on p+(2h|h), 
as a function of the ratio of atomic numbers, $\rho = Z_X/Z_C$, for a molecule 
with the composition C30X2. The +++'s are the exact results 
and the solid lines are the approximation of eq (23). The values 
chosen are (a) |E(h)| = |E(2h)| = 1.5, (b) |E(h)| = |E(2h)| = 2.0, 
(c) |E(h)| = |E(2h)| = 2.25. Note that the approximation is always 
on the conservative side. It is not known whether this is always true. 

A final word is in order about the philosophy behind 
the series of projects that we have undertaken. It would 
hardly be sensible to want to eliminate methods based on 
the central limit theorem that have served crystallographers 
so well in the past. However, it is useful to 
establish the limitations of these methods by having 
more exact representations available. Indeed we have 
explored such limitations in the case of tests based on the 
$\Sigma_1$ relationship as indicated earlier and are presently 
considering more complicated crystallographic techniques. 
Furthermore, as the processing of crystallographic 
data becomes more and more automated it 
becomes increasingly attractive to have exact, rather 
than approximate, formulae in the computer. We hope, 
in the coming years, to explore the feasibility of doing 
this for a variety of techniques, as well as contributing to 
the development of further ones based on the availability 
of exact representations. 

This interesting paper by Drs. Weiss and Shmueli 
represents a substantially exact solution of a problem 
that has concerned crystallographers for more than 35 
years, the analysis in terms of atomic structure of x-ray 
diffraction data. (Similar information can be obtained 
from the diffraction of electrons and neutrons, but, for 
reasons that are both experimental and theoretical, this 
information is mainly used to supplement that obtained 
from x-ray diffraction, which remains the basic tool of 
the structural crystallographer.) The observed intensity 
in x-ray diffraction is given by 

$$I = SL\,|F(\mathbf{h})|^2,$$

where S is a scale factor, L is a geometrical factor, and 
F(h), commonly called the structure factor, is the Fourier 
transform of the electron density in a crystal. It may 
be written in the form 

$$F(\mathbf{h}) = \int \rho(\mathbf{r})\exp(2\pi i\,\mathbf{h}\cdot\mathbf{r})\,d\mathbf{r}.$$



The density function, $\rho(\mathbf{r})$, in a crystal is periodic in 
three dimensions, so that it can be represented as a convolution 
of a function consisting of $\delta$ functions located 
at the nodes of a space lattice and a density defined in a 
small region known as a unit cell. Because of the period- 
icity the Fourier transform has appreciable values only 
at the nodes of a lattice in transform space, called by 
crystallographers the reciprocal lattice. Because it is a 
physical quantity, $\rho(\mathbf{r})$ is non-negative, and, furthermore, 
because a crystal is composed of atoms, it can be 






